Finding Consistent Clusters in Data Partitions
Abstract
Given an arbitrary data set for which no particular parametric, statistical, or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, a single clustering algorithm can also yield several partitions, owing to its dependence on initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions by analyzing the most common associations under a majority voting scheme. Clustering results are combined by transforming the data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples using both simulated and real data show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependence on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
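The combination step described in the abstract can be illustrated with a minimal sketch. It is not the paper's voting-k-means algorithm, whose details the abstract does not give. It assumes the partitions are already available as label lists, that "majority voting" means keeping pairs of samples that share a cluster in more than half of the partitions (the `threshold=0.5` default is an assumption), and that consistent clusters are read off as connected groups of such pairs. The function names `co_association` and `consistent_clusters` are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    # A sample is always co-associated with itself.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]


def consistent_clusters(partitions, threshold=0.5):
    """Merge samples whose co-association exceeds the threshold (assumed 0.5)."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three partitions of six samples, with different cluster counts and label names.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
labels = consistent_clusters(partitions)
```

In this toy input the second partition uses three clusters and the others use two, yet the number of combined clusters is not specified in advance: it follows from which pairwise associations survive the vote.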
Similar resources
Ensemble Community Detection in Social Networks
One of the great challenges in Social Network Analysis (SNA) is community detection. A community is a group of vertices with dense internal connections and sparse external connections. Community detection, or clustering, reveals the community structure of social networks and the hidden relationships among their constituents. Given the growth of datasets related to social networks, we need scalabl...
Clustering Ensemble Based on a Subset of Primary Clusters
Most of the recent studies have tried to create diversity in primary results and then applied a consensus function over all the obtained results to combine the weak partitions. In this paper a clustering ensemble method is proposed which is based on a subset of primary clusters. The main idea behind this method is using more stable clusters in the ensemble. The stability is applied as a goodnes...
Selecting Primary Clusters with Intelligent Algorithms for Participation in a Clustering Ensemble
Most of the recent studies have tried to create diversity in primary results and then applied a consensus function over all the obtained results to combine the weak partitions. In this paper a clustering ensemble method is proposed which is based on a subset of primary clusters. The main idea behind this method is using more stable clusters in the ensemble. The stability is applied as a goodnes...
Quality Scheme Assessment in the Clustering Process
Clustering is mostly an unsupervised procedure, and most clustering algorithms depend on assumptions and initial guesses in order to define the subgroups present in a data set. As a consequence, in most applications the final clusters require some sort of evaluation. The evaluation procedure has to tackle difficult problems, which can be qualitatively expressed as: i. quality of cluster...
Cluster Analysis Through Model Selection
Clustering is an important and challenging statistical problem for which there is an extensive literature. Modelling approaches include mixture models and product partition models. Here we develop a product partition model and search algorithm driven by Bayes factors from intrinsic priors. The priors we develop for the partitions, and the number of clusters in the partition, lead to finding par...